Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find UW star files avoiding hard-coded names #91

Merged
merged 1 commit into from
Apr 24, 2024

Conversation

jchiang87
Copy link
Collaborator

This samples the desired healpixel, computing the corresponding HTM index at each point, and adds that corresponding file to the list.

if self._files:
# The files have already been indexed.
return
files = sorted(glob.glob(os.path.join(input_dir,
Copy link
Collaborator

@JoanneBogart JoanneBogart Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A description of the format for input file basenames would be helpful, either in a comment or use a regular expression to pick out fields of interest, which can be named, e.g.
pat = "stars_chunk_(?P<min_htm_id>\d+)_(?P<max_htm_id>\d+).parquet"
Then lines 25-28 below get replaced with
m = re.match(pat, os.path.basename(item))
self._files[(m['min_htm_id'], m['max_htm_id'])] = item

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike regular expressions, but if u really think this is better and generally easier for people to understand (I don't particularly), I can change it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the regular expression clearer; I don't know whether most people do or not. Adding a comment describing the structure of the basename instead is fine with me.


return [os.path.join(dirpath, fname) for fname in fnames]
def find_files(self, pixel, nside=32, res_factor=16):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring for this routine would be welcome. Why was 16 chosen as default for res_factor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'll add a comment. resfactor 16 is obviously the resfactor used by those files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does resfactor have to do with HTM depth (apparently 20), if anything?
Healpixels with nside=32 are much bigger than HTM depth 20 regions (or depth 16 regions, for that matter). How did you verify that nside*16 gives a sufficiently fine grid?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to do this (which seems safer to me) would be to define a minimum circle around the healpixel, then call HtmIndexer.getShardIds

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok. There are a couple places noted in inline comments which could use a little more explanation.

@jchiang87 jchiang87 force-pushed the u/jchiang/find_UW_star_files branch from 1cc74ee to 4853818 Compare April 24, 2024 19:33
@jchiang87 jchiang87 merged commit bec65b9 into main Apr 24, 2024
1 check passed
@jchiang87 jchiang87 deleted the u/jchiang/find_UW_star_files branch April 24, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants